Data Visualization

GVPT399F: Power, Politics, and Data

Data visualization

We will use data visualization to answer the following question:

Do cars with big engines use more fuel than cars with small engines?

EXERCISE

  1. What do you think is the answer to this question?


  2. How would you prove your answer? What information about cars would you need?

R4DS

This session will borrow (read: steal) heavily from Hadley Wickham’s R for Data Science book.

Source: R4DS

Data visualization: a critical skill

  • You can learn an incredible amount about your data from plotting it

  • A plot makes the large amount of information involved in data analysis much easier to digest than raw numbers do
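One way to see why plotting matters is Anscombe's quartet, four small data sets that ship with base R as the `anscombe` data frame. This example is not part of the original slides. It is a minimal sketch showing that the four sets have almost the same means and correlations, yet look nothing alike once plotted.

```r
# Anscombe's quartet: columns x1..x4 and y1..y4 in the built-in `anscombe` data frame
x_means <- sapply(anscombe[, 1:4], mean)
y_means <- sapply(anscombe[, 5:8], mean)

# Correlation of each x column with its matching y column
correlations <- mapply(cor, anscombe[, 1:4], anscombe[, 5:8])

# The numeric summaries barely differ across the four sets
print(round(rbind(x_means, y_means, correlations), 2))

# The scatter plots, drawn in a 2 x 2 grid, tell four different stories
old_par <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
}
par(old_par)  # restore the previous plotting layout
```

Reading only the printed table, you would likely conclude the four data sets are interchangeable. The plots show otherwise.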

Skipping to the end

How did we do this?

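library(ggplot2)  # provides ggplot() and the mpg data set

# Engine size (displ) on the x-axis, highway fuel economy (hwy) on the y-axis
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  # One point per car, coloured by class of car
  geom_point(mapping = aes(colour = class)) +
  # Straight line of best fit through all of the points
  geom_smooth(method = "lm") +
  # Move the legend and strip the default grey grid and background
  theme(
    legend.position = "bottom",
    panel.grid = element_blank(),
    panel.background = element_blank(),
    plot.title.position = "plot",
    plot.title = element_text(face = "bold")
  ) +
  # Human-readable title, subtitle, and axis labels
  labs(
    title = "Engine displacement and highway miles per gallon",
    subtitle = "Values for seven different classes of cars",
    x = "Engine displacement (L)",
    y = "Highway miles per gallon"
  )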
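The line drawn by geom_smooth(method = "lm") can also be checked numerically. The sketch below is an addition, not part of the original slides. It assumes the line is an ordinary linear regression of hwy on displ, which is what ggplot2 fits here because the x and y aesthetics are set once for the whole plot. The names `fit` and `slope` are illustrative only.

```r
library(ggplot2)  # provides the mpg data set

# Fit the same straight-line model the plot draws: hwy explained by displ
fit <- lm(hwy ~ displ, data = mpg)

# Intercept and slope of the fitted line
print(coef(fit))

# The slope is the change in highway miles per gallon per extra litre of engine size
slope <- coef(fit)[["displ"]]
```

Because hwy measures miles per gallon, a negative slope means that cars with bigger engines travel fewer miles on each gallon, that is, they use more fuel.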